Dynamical Isometry



Initialization of ReLUs for Dynamical Isometry

Neural Information Processing Systems

Deep learning relies on good initialization schemes and hyperparameter choices prior to training a neural network. Random weight initializations induce random network ensembles, which determine the trainability, training speed, and sometimes also the generalization ability of a trained instance. In addition, such ensembles provide theoretical insight into the space of candidate models from which one is selected during training. The results obtained so far rely on mean field approximations that assume infinite layer width and study average squared signals. We derive the joint signal output distribution exactly, without mean field assumptions, for fully-connected networks with Gaussian weights and biases, and analyze deviations from the mean field results. For rectified linear units, we further discuss limitations of the standard initialization scheme, such as its lack of dynamical isometry, and propose a simple alternative that overcomes these limitations through initial parameter sharing.
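
A minimal NumPy sketch of the kind of parameter-sharing scheme the abstract alludes to, in the style of the mirrored "looks-linear" construction of Balduzzi et al. (cited in the review below): each layer shares one orthogonal block W between two sign-mirrored halves, so that ReLU(z) - ReLU(-z) = z makes the network exactly linear, and hence an exact isometry, at initialization. The function names and the equal-width restriction are illustrative assumptions, not the authors' code.

```python
import numpy as np

def relu(x):
    return np.maximum(x, 0.0)

def random_orthogonal(n, rng):
    # Haar-random orthogonal matrix via QR of a Gaussian matrix.
    q, r = np.linalg.qr(rng.standard_normal((n, n)))
    return q * np.sign(np.diag(r))

def looks_linear_init(depth, n, rng):
    # Each layer stores one orthogonal block W; the effective weight is
    # the mirrored, parameter-shared matrix [[W, -W], [-W, W]].
    blocks = [random_orthogonal(n, rng) for _ in range(depth)]
    return [np.block([[W, -W], [-W, W]]) for W in blocks]

def forward(x, layers):
    h = np.concatenate([relu(x), relu(-x)])   # mirrored representation of x
    for W_ll in layers:
        h = relu(W_ll @ h)                    # mirrored halves never both clip
    return h

rng = np.random.default_rng(0)
n, depth = 8, 10
layers = looks_linear_init(depth, n, rng)
x = rng.standard_normal(n)
h = forward(x, layers)
z = h[:n] - h[n:]             # undo the mirroring: z = (W_L ... W_1) x
print(np.isclose(np.linalg.norm(z), np.linalg.norm(x)))  # True: exact isometry
```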




IDInit: A Universal and Stable Initialization Method for Neural Network Training

Pan, Yu, Wang, Chaozheng, Wu, Zekai, Wang, Qifan, Zhang, Min, Xu, Zenglin

arXiv.org Artificial Intelligence

Deep neural networks have achieved remarkable accomplishments in practice. The success of these networks hinges on effective initialization methods, which are vital for ensuring stable and rapid convergence during training. Recently, initialization methods that maintain identity transitions within layers have shown good efficiency in network training. These techniques (e.g., Fixup) set specific weights to zero to achieve identity control. However, the settings of the remaining weights (e.g., Fixup initializes the non-zero weights with random values) can disturb the inductive bias that the zero weights alone would achieve, which may be harmful to training. Addressing this concern, we introduce fully identical initialization (IDInit), a novel method that preserves identity in both the main and sub-stem layers of residual networks. IDInit employs a padded identity-like matrix to overcome rank constraints in non-square weight matrices. Furthermore, we show that the convergence problem of an identity matrix can be solved by stochastic gradient descent. Additionally, we enhance the universality of IDInit by processing higher-order weights and addressing dead neuron problems. IDInit is a straightforward yet effective initialization method, with improved convergence, stability, and performance across various settings, including large-scale datasets and deep models.
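
A hedged sketch of what a "padded identity-like matrix" for a non-square weight might look like: place the identity on the leading square block and repeat the pattern so no row or column is left entirely zero. This is one plausible reading of the abstract, not the paper's exact construction, and `identity_like` is a hypothetical helper name.

```python
import numpy as np

def identity_like(n_out, n_in):
    """Identity on the leading square block, with the identity pattern
    repeated to cover the extra rows or columns of a non-square matrix
    (illustrative reading of 'padded identity-like')."""
    W = np.zeros((n_out, n_in))
    for i in range(max(n_out, n_in)):
        W[i % n_out, i % n_in] = 1.0
    return W

# Square weights pass the signal through unchanged; non-square weights
# replicate or fold coordinates instead of leaving zero rows/columns.
print(identity_like(4, 2))       # two stacked 2x2 identities
x = np.arange(3.0)
print(identity_like(3, 3) @ x)   # [0. 1. 2.]
```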


Reviews: Initialization of ReLUs for Dynamical Isometry

Neural Information Processing Systems

The response did elaborate on the relationship between the ReLU initialization approaches considered and the earlier portion of the paper; this should be made clearer in the paper itself. However, as pointed out by the other reviewers, the structure in the proposed Gaussian submatrix initialization has previously been proposed in Balduzzi et al. [2]. The paper analyzes how signals are transformed through the layers of a feedforward neural network, assuming weights are initialized from Gaussian distributions. Previous work used a mean-field assumption to study these dynamics and used the results to identify parameters for the Gaussians that ensure stable propagation of the mean of the signal variance through the layers, a necessary condition for training deep networks. This work considers how the distribution of the initial signal variance is transformed through the layers of the network.
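
For concreteness, the mean-field recursion the review refers to can be written down in a few lines. For a ReLU layer with weights W_ij ~ N(0, sigma_w^2 / n) and biases b_i ~ N(0, sigma_b^2), the mean squared signal obeys q_{l+1} = (sigma_w^2 / 2) q_l + sigma_b^2, whose fixed point with zero bias is sigma_w^2 = 2 (He initialization). A small sketch under these standard assumptions:

```python
# Mean-field variance map for one ReLU layer with W_ij ~ N(0, sigma_w2 / n)
# and b_i ~ N(0, sigma_b2): E[relu(z)^2] = q/2 for z ~ N(0, q), so
#   q_{l+1} = 0.5 * sigma_w2 * q_l + sigma_b2.
def variance_map(q, sigma_w2, sigma_b2=0.0):
    return 0.5 * sigma_w2 * q + sigma_b2

for sigma_w2 in (1.0, 2.0, 3.0):
    q = 1.0
    for _ in range(20):
        q = variance_map(q, sigma_w2)
    print(sigma_w2, q)   # vanishes, stays at 1.0 (He init), or explodes
```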


Reviews: Resurrecting the sigmoid in deep learning through dynamical isometry: theory and practice

Neural Information Processing Systems

The article focuses on understanding the learning dynamics of deep neural networks as a function of both the activation functions used at the different layers and the way the weights are initialized. It is mainly a theoretical paper, with some experiments that confirm the theoretical study. The core of the contribution rests on random matrix theory. The first section describes the setup -- a deep neural network as a sequence of layers -- and the tools that will be used to study its dynamics. The analysis mainly relies on the density of singular values of the input-output Jacobian matrix, this density being computed by a four-step method proposed in the article.
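
As an empirical counterpart to the analytic four-step method, one can sample the singular values of the end-to-end Jacobian J = prod_l D_l W_l directly. The sketch below is an illustration under assumed unit-gain initializations, not the paper's computation; it contrasts Gaussian and orthogonal weights for a tanh network.

```python
import numpy as np

def jacobian_svals(depth, width, rng, orthogonal=False, gain=1.0):
    """Singular values of J = prod_l D_l W_l for a random tanh network,
    where D_l is the diagonal of tanh' at the pre-activations."""
    x = rng.standard_normal(width)
    J = np.eye(width)
    for _ in range(depth):
        if orthogonal:
            W, _ = np.linalg.qr(rng.standard_normal((width, width)))
            W = gain * W
        else:
            W = gain * rng.standard_normal((width, width)) / np.sqrt(width)
        z = W @ x
        J = np.diag(1.0 - np.tanh(z) ** 2) @ W @ J  # tanh'(z) = 1 - tanh(z)^2
        x = np.tanh(z)
    return np.linalg.svd(J, compute_uv=False)

rng = np.random.default_rng(0)
for orth in (False, True):
    s = jacobian_svals(depth=32, width=256, rng=rng, orthogonal=orth)
    tag = "orthogonal" if orth else "gaussian"
    print(tag, f"max={s.max():.3g} min={s.min():.3g}")
# Dynamical isometry means this spectrum concentrates around 1, which the
# paper shows is achievable with orthogonal weights and a tuned gain.
```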



Sparser, Better, Deeper, Stronger: Improving Sparse Training with Exact Orthogonal Initialization

Nowak, Aleksandra Irena, Gniecki, Łukasz, Szatkowski, Filip, Tabor, Jacek

arXiv.org Artificial Intelligence

Static sparse training aims to train sparse models from scratch and has achieved remarkable results in recent years. A key design choice is the sparse initialization, which determines the trainable sub-network through a binary mask. Existing methods mainly select such a mask based on a predefined dense initialization. Such an approach may not efficiently leverage the mask's potential impact on the optimization. An alternative direction, inspired by research into dynamical isometry, is to introduce orthogonality in the sparse subnetwork, which helps stabilize the gradient signal. In this work, we propose Exact Orthogonal Initialization (EOI), a novel sparse orthogonal initialization scheme based on composing random Givens rotations. Contrary to other existing approaches, our method provides exact (not approximated) orthogonality and enables the creation of layers with arbitrary densities. We demonstrate the superior effectiveness and efficiency of EOI through experiments, consistently outperforming common sparse initialization techniques. Our method enables training highly sparse 1000-layer MLP and CNN networks without residual connections or normalization techniques, emphasizing the crucial role of weight initialization in static sparse training alongside sparse mask selection. The code is available at https://github.com/woocash2/sparser-better-deeper-stronger
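
A minimal sketch of the idea named in the abstract: composing random Givens rotations yields a matrix that is exactly orthogonal at every step, with density growing as rotations accumulate. The function names are illustrative, the matrix is stored densely here for clarity, and EOI's actual scheme (see the repository above) chooses rotation planes to hit a target density.

```python
import numpy as np

def givens(n, i, j, theta):
    """n x n Givens rotation acting in the (i, j) coordinate plane."""
    G = np.eye(n)
    c, s = np.cos(theta), np.sin(theta)
    G[i, i] = c; G[j, j] = c
    G[i, j] = -s; G[j, i] = s
    return G

def sparse_orthogonal(n, num_rotations, rng):
    """Product of random Givens rotations: exactly orthogonal by
    construction, with density controlled by num_rotations."""
    W = np.eye(n)
    for _ in range(num_rotations):
        i, j = rng.choice(n, size=2, replace=False)
        W = givens(n, i, j, rng.uniform(0.0, 2.0 * np.pi)) @ W
    return W

rng = np.random.default_rng(0)
W = sparse_orthogonal(64, num_rotations=100, rng=rng)
print(np.allclose(W.T @ W, np.eye(64)))   # True: exact orthogonality
print(np.mean(np.abs(W) > 1e-12))         # fraction of nonzeros (< 1)
```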